2023-08-31
Please read these instructions before writing.
Regarding Question 4, Professor Love is the large fellow standing in the front of the room.
The key distinction we’ll make is between quantitative and qualitative (categorical) information.
Information that is quantitative describes a quantity.
Continuous variables (can take any value in a range) vs. Discrete variables (limited set of potential values)
We can also distinguish interval variables (equal distance between values, but the zero point is arbitrary) from ratio variables (meaningful zero point).
Qualitative variables consist of names of categories.
315 people took (essentially) the same survey in the same way.
| Fall | 2019 | 2018 | 2017 | 2016 | 2015 | 2014 | Total |
|---|---|---|---|---|---|---|---|
| n | 61 | 51 | 48 | 64 | 49 | 42 | 315 |
About how many of those 315 surveys caused no problems in recording responses?
| # | Topic | # | Topic |
|---|---|---|---|
| Q1 | glasses | Q9 | lectures_vs_activities |
| Q2 | english | Q10 | projects_alone |
| Q3 | stats_so_far | Q11 | height |
| Q4 | guess_TL_ht | Q12 | hand_span |
| Q5 | smoke | Q13 | color |
| Q6 | handedness | Q14 | sleep |
| Q7 | stats_future | Q15 | pulse_rate |
| Q8 | haircut | - | - |
In some years, Q1 asked about sex rather than glasses.
About how many of those 315 surveys caused no problems in recording responses?
What should we do in these cases?
pulse responses, sorted (n = 61, 1 NA)
33 46 48 56 60 60
62 63 65 65 66 66
68 68 68 69 70 70
70 70 70 70 70 70
71 72 72 74 74 74
74 74 75 76 76 76
78 78 78 80 80 80
80 81 82 84 84 85
86 86 88 90 90 90
90 94 96 104 104 110
Stem-and-leaf display of the pulse responses
 3 | 3
 4 | 68
 5 | 6
 6 | 002355668889
 7 | 00000000122444445666888
 8 | 000012445668
 9 | 000046
10 | 44
11 | 0
(Thanks, John Tukey)
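A display like this one can be drawn with base R's `stem()` function. The sketch below types in the 60 recorded pulse rates from the sorted list, plus the one missing value. The vector names `pulse` and `pulse_obs` are illustrative choices, and the stems R prints can change with the `scale` argument.

```r
# Pulse responses from the sorted list (60 recorded values, plus 1 NA)
pulse <- c(33, 46, 48, 56, 60, 60,
           62, 63, 65, 65, 66, 66,
           68, 68, 68, 69, 70, 70,
           70, 70, 70, 70, 70, 70,
           71, 72, 72, 74, 74, 74,
           74, 74, 75, 76, 76, 76,
           78, 78, 78, 80, 80, 80,
           80, 81, 82, 84, 84, 85,
           86, 86, 88, 90, 90, 90,
           90, 94, 96, 104, 104, 110,
           NA)

# Drop the missing value, then draw the stem-and-leaf display
pulse_obs <- pulse[!is.na(pulse)]
stem(pulse_obs)

# Quick checks on what was plotted
length(pulse_obs)   # number of recorded pulse rates
median(pulse_obs)   # middle of the sorted values
```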
The .csv file
I’ve placed class01_age_guesses_2022-2023.csv on our 431-data page. This includes guesses from 2022 and 2023.
The age_guess Tibble
Clicking on RAW in the 431-data presentation takes us to a (long) URL that contains the raw data in this sheet.
I’ll read in the sheet’s data to a new tibble (a special kind of R data frame) called age_guess using the read_csv() function.
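Here is a minimal sketch of what that `read_csv()` call could look like. Since the long raw URL is not reproduced in these notes, the sketch reads the first ten rows of the data (typed in from the tibble printout in these notes) from inline text via `I()`. The names `csv_text` and `age_guess_demo` are hypothetical; with the real file, the raw URL would be the first argument instead.

```r
library(readr)

# First ten rows of the age guessing data, typed in from the tibble printout
csv_text <- "student,guess1,guess2,actual,year
S-2022-01,57,62,55.5,2022
S-2022-02,53,53,55.5,2022
S-2022-03,50,50,55.5,2022
S-2022-04,48,56,55.5,2022
S-2022-05,61,NA,55.5,2022
S-2022-06,63,63,55.5,2022
S-2022-07,67,58,55.5,2022
S-2022-08,50,57,55.5,2022
S-2022-09,50,50,55.5,2022
S-2022-10,43,56,55.5,2022
"

# I() tells read_csv() to treat the string as literal data, not a file path
age_guess_demo <- read_csv(I(csv_text), show_col_types = FALSE)

# Printing a tibble shows its dimensions, column types and first rows
age_guess_demo
```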
The age_guess tibble
What do we get?
# A tibble: 92 × 5
student guess1 guess2 actual year
<chr> <dbl> <dbl> <dbl> <dbl>
1 S-2022-01 57 62 55.5 2022
2 S-2022-02 53 53 55.5 2022
3 S-2022-03 50 50 55.5 2022
4 S-2022-04 48 56 55.5 2022
5 S-2022-05 61 NA 55.5 2022
6 S-2022-06 63 63 55.5 2022
7 S-2022-07 67 58 55.5 2022
8 S-2022-08 50 57 55.5 2022
9 S-2022-09 50 50 55.5 2022
10 S-2022-10 43 56 55.5 2022
# ℹ 82 more rows
How many first guesses in each year were less than 56?
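One way to answer a question like this with dplyr is to group by `year` and add up a logical condition. The sketch below uses only the ten 2022 rows visible in the printout, so it returns a single row; the names `demo` and `below_56` are illustrative. Applied to the full `age_guess` tibble, the same pipeline should give one row per year.

```r
library(dplyr)
library(tibble)

# The ten first guesses shown in the tibble printout (all from 2022)
demo <- tibble(
  year   = rep(2022, 10),
  guess1 = c(57, 53, 50, 48, 61, 63, 67, 50, 50, 43)
)

# sum() of a logical vector counts the TRUE values within each year
below_56 <- demo |>
  group_by(year) |>
  summarise(n_students = n(),
            n_below_56 = sum(guess1 < 56))

below_56
```

No `na.rm = TRUE` is needed here because these first guesses contain no missing values; a column with `NA`s would need it.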
What do the guess1 values look like?
Change the theme, and specify bin width rather than number of bins.
Add a vertical line at 56 years to show my actual age.
ggplot(age_guess,
       aes(x = guess1)) +
  geom_histogram(binwidth = 2,
                 col = "white", fill = "blue") +
  geom_vline(aes(xintercept = 56), col = "red") +
  theme_bw() +
  labs(
    x = "First Guess of Dr. Love's Age",
    y = "Fall 2022 and 2023 431 students",
    title = "Pretty wide range of guesses",
    subtitle = "Dr. Love's Actual Age = 55.5 in 2022, 56.5 in 2023")
Create two facets, one for 2022 and one for 2023 guesses…
ggplot(age_guess,
       aes(x = guess1, fill = factor(year))) +
  geom_histogram(binwidth = 2, col = "white") +
  theme_bw() +
  facet_grid(year ~ .) +
  labs(
    x = "First Guess of Dr. Love's Age",
    y = "# of Students",
    title = "Distribution of guesses is a bit older in 2023",
    subtitle = "Dr. Love's Actual Age = 55.5 in 2022, 56.5 in 2023")
student guess1 guess2 year
Length:92 Min. :40.00 Min. :40.00 Min. :2022
Class :character 1st Qu.:50.00 1st Qu.:52.00 1st Qu.:2022
Mode :character Median :55.00 Median :56.00 Median :2022
Mean :53.67 Mean :55.12 Mean :2022
3rd Qu.:58.00 3rd Qu.:58.00 3rd Qu.:2023
Max. :72.00 Max. :70.00 Max. :2023
NA's :3
What does NA's : 3 mean in guess2?
Why is student not summarized any further?
favstats function from the mosaic package
min Q1 median Q3 max mean sd n missing
40 50 55 58 72 53.67391 6.200157 92 0
min Q1 median Q3 max mean sd n missing
40 52 56 58 70 55.1236 5.359402 89 3
describe function from the psych package
Create new variable (change = guess2 - guess1)
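A minimal sketch of creating this variable with `mutate()`, using the ten rows visible in the tibble printout. The tibble name `demo` is illustrative; the real work would be done on `age_guess`.

```r
library(dplyr)
library(tibble)

# First and second guesses for the ten students shown in the printout
demo <- tibble(
  guess1 = c(57, 53, 50, 48, 61, 63, 67, 50, 50, 43),
  guess2 = c(62, 53, 50, 56, NA, 63, 58, 57, 50, 56)
)

# change is positive when the second guess is older than the first
demo <- demo |>
  mutate(change = guess2 - guess1)

demo

# A missing guess2 gives a missing change, so summaries need na.rm = TRUE
mean(demo$change, na.rm = TRUE)
```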
What will this look like?
Call:
lm(formula = guess2 ~ guess1, data = age_guess)
Coefficients:
(Intercept) guess1
22.6209 0.6067
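To check how many rows a fit like this actually uses, here is a small sketch built on the ten rows visible in the tibble printout, one of which has a missing `guess2`. The data frame name `demo` is illustrative, and the coefficients from these ten rows will not match those from the full data.

```r
# Ten rows from the printout; the fifth student has no second guess
demo <- data.frame(
  guess1 = c(57, 53, 50, 48, 61, 63, 67, 50, 50, 43),
  guess2 = c(62, 53, 50, 56, NA, 63, 58, 57, 50, 56)
)

fit <- lm(guess2 ~ guess1, data = demo)

nrow(demo)                  # 10 rows supplied
sum(complete.cases(demo))   # 9 rows with both guesses present
nobs(fit)                   # 9 observations used in the fit
```

The default `na.action` option in R is `na.omit`, which is why the row with the missing value is dropped without a warning.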
lm filters to complete cases by default.
ggplot(data = temp, aes(x = guess1, y = guess2)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, col = "blue") +
  geom_abline(intercept = 0, slope = 1, col = "red") +
  geom_text(x = 40, y = 38, label = "y = x", col = "red") +
  labs(x = "First Guess of Love's Age",
       y = "Second Guess of Love's Age",
       title = "Student Guesses of Dr. Love's Age in 2022 and 2023",
       subtitle = "Love's actual age = 55.5 in 2022, 56.5 in 2023") +
  theme_bw()
See the class01-guess10ages-2023-08-29 Google Sheet on our Shared Drive.
| Group | Correct | Within 2 | Within 5 | Too Low | Too High |
|---|---|---|---|---|---|
| < 0.05 | 2 | 3 | 6 | 3 | 5 |
| Complexity | 2 | 3 | 6 | 5 | 3 |
| Mini UN | 0 | 3 | 6 | 3 | 7 |
| Soon to be R Masters | 0 | 3 | 5 | 5 | 5 |
| KAPSY | 0 | 2 | 6 | 4 | 6 |
| R Amateurs | 0 | 2 | 5 | 4 | 6 |
| Stats R Fun! | 0 | 2 | 4 | 4 | 6 |
| How Old R You | 0 | 1 | 6 | 3 | 7 |
| Outliers | 0 | 1 | 5 | 4 | 6 |
| Group | Mean Error | SD (Errors) | Median Error | (Min, Max) Error |
|---|---|---|---|---|
| KAPSY | -0.1 | 7.8 | 1.5 | (-13, 11) |
| Stats R Fun! | -0.2 | 6.3 | 2 | (-9, 7) |
| Outliers | 0.4 | 6.9 | 3.5 | (-12, 8) |
| Soon to be R Masters | 0.7 | 6.1 | 0.5 | (-9, 8) |
| R Amateurs | 1.4 | 8.6 | 2 | (-14, 12) |
| Complexity | -1.7 | 6.5 | -1 | (-15, 6) |
| < 0.05 | 1.8 | 7.4 | 1 | (-12, 12) |
| Mini UN | 2.4 | 6.8 | 2.5 | (-10, 14) |
| How Old R You | 2.7 | 5.6 | 3.5 | (-8, 10) |
| Group | Mean AE | Range (AE) | Median AE | MSE |
|---|---|---|---|---|
| Complexity | 4.9 | (0, 15) | 4.5 | 41.5 |
| Soon to be R Masters | 5.1 | (1, 9) | 5.5 | 34.1 |
| How Old R You | 5.3 | (2, 10) | 4.5 | 35.3 |
| Stats R Fun! | 5.4 | (2, 9) | 6 | 35.8 |
| < 0.05 | 5.6 | (0, 12) | 3.5 | 52.8 |
| Mini UN | 5.6 | (1, 14) | 4 | 47.6 |
| Outliers | 6.0 | (2, 12) | 5.5 | 43.6 |
| KAPSY | 6.3 | (1, 13) | 4.5 | 54.9 |
| R Amateurs | 6.8 | (1, 14) | 6 | 69.2 |
temp_url <- "https://raw.githubusercontent.com/THOMASELOVE/431-data/main/data-and-code/class01-photo-age-history-2023.csv"
photos <- read_csv(temp_url, show_col_types = FALSE)
photos
# A tibble: 110 × 12
card label age sex facing year mean_guess error abs_error sq_error
<dbl> <chr> <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 1 Chong 21 M R 2023 27.9 6.9 6.9 47.6
2 2 Archuleta 64 F L 2023 53 -11 11 121
3 3 Mayfield 28 F L 2023 30.2 2.2 2.2 4.84
4 4 Love 14 M L 2023 16 2 2 4
5 5 McGinn 54 F R 2023 61.6 7.6 7.6 57.8
6 6 Chaney 74 M L 2023 72 -2 2 4
7 7 Storm 44 M R 2023 47.3 3.3 3.3 10.9
8 8 Glantz 83 F L 2023 78.6 -4.4 4.4 19.4
9 9 Honey 24 M L 2023 32.9 8.9 8.9 79.2
10 10 Lawson 34 F R 2023 28.8 -5.2 5.2 27.0
# ℹ 100 more rows
# ℹ 2 more variables: `detailed description` <chr>, jpeg <chr>
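The `error`, `abs_error` and `sq_error` columns appear to follow from `age` and `mean_guess`. The sketch below assumes error = mean_guess - age, which agrees with the ten rows shown, and rebuilds those columns for just those rows before averaging them. The name `photos_demo` is illustrative, and the averages here describe only these ten photos, not any group's results in the tables.

```r
library(dplyr)
library(tibble)

# Ages and mean guesses for the ten photos shown in the printout
photos_demo <- tibble(
  label      = c("Chong", "Archuleta", "Mayfield", "Love", "McGinn",
                 "Chaney", "Storm", "Glantz", "Honey", "Lawson"),
  age        = c(21, 64, 28, 14, 54, 74, 44, 83, 24, 34),
  mean_guess = c(27.9, 53, 30.2, 16, 61.6, 72, 47.3, 78.6, 32.9, 28.8)
)

# Assumed definitions: positive error means the guess was too high
photos_demo <- photos_demo |>
  mutate(error     = mean_guess - age,
         abs_error = abs(error),
         sq_error  = error^2)

# Mean error, mean absolute error (AE) and mean squared error (MSE)
photos_demo |>
  summarise(mean_error = mean(error),
            mean_ae    = mean(abs_error),
            mse        = mean(sq_error))
```

Mean error lets overestimates and underestimates cancel, while mean AE and MSE do not, so a set of guesses can look unbiased on the first measure and still be far off on the other two.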
The large black “X”s in the plot show 2023 results.
431 Class 02 | 2023-08-31 | https://thomaselove.github.io/431-2023/